- Dronjic, V. & Helms-Park, R. (2014). Fixed-choice word-association tasks as second-language lexical tests: What native-speaker performance reveals about their potential weaknesses. Applied Psycholinguistics, 35(1), 193-221.
Abstract: Qian and Schedl's Depth of Vocabulary Knowledge Test was administered to 31 native-speaker undergraduates under an 'unconstrained' condition, in which the number of responses to headwords was unfixed, whereas a corresponding group (n = 36) completed the test under the original 'constrained' condition. Results revealed lower accuracy in the unconstrained condition and in paradigmatic versus syntagmatic responses. Native speakers failed to reach the 90% criterion on most unconstrained and many constrained items. Although certain modifications could improve such a test (e.g., eliminating psycholinguistically anomalous headwords, such as adjectives, or presenting responses to headwords discontinuously), two intransigent problems impede test validity. First, collocates in the mental lexicon differ in tightness and vary across dialects, sociolects, and age groups. Second, and more seriously, second-language Depth of Vocabulary Knowledge Tests are likely spot checks of metalinguistic knowledge rather than depth tests that reflect what learners would actually produce in spontaneous utterances. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Test Validity and Reliability, Mental Lexicon, Vocabulary Size, Language Tests
- Baker, B. A. (2012). Individual differences in rater decision-making style: An exploratory mixed-methods study. Language Assessment Quarterly, 9, 225-248.
Abstract: Researchers of high-stakes, subjectively scored writing assessments have done much work to better understand the process that raters go through in applying a rating scale to a language performance to arrive at a score. However, there is still unexplained, systematic variability in rater scoring that resists rater training (see Hoyt & Kerns, 1999; McNamara, 1996; Weigle, 2002; Weir, 2005). The consideration of individual differences in rater cognition may explain some of this rater variability. This mixed-method exploratory case study (Yin, 2009) examined rater decision making in a high-stakes writing assessment for preservice teachers in Quebec, Canada, focussing on individual differences in decision-making style, or 'stylistic differences in cognitive style that could affect decision-making' (Thunholm, 2004). The General Decision Making Style Inventory questionnaire (Scott & Bruce, 1995) was administered to six raters of a high-stakes writing exam in Quebec, and information on the following rater behaviours was also collected for its potential to provide additional information on individual decision-making style (DMS): (a) the frequency at which a rater decides to defer his or her score, (b) the underuse of failing score levels, and (c) the comments provided by raters during the exam rating about their decisions (collected through 'write-aloud' protocols; Gigerenzer & Hoffrage, 1995). The relative merits of each of these sources of data are discussed in terms of their potential for tapping into the construct of rater DMS. Although score effects of DMS have yet to be established, it is concluded that despite the exploratory nature of this study, there is potential for the consideration of individual sociocognitive differences in accounting for some rater variability in scoring. Adapted from the source document
Keywords: applied linguistics, writing instruction, acquisition, processes, and testing, Individual Differences, Writing Tests, Test Validity and Reliability, Cognitive Processes, Quebec, Rating Scales
- Eckes, Thomas. (2012). Operational rater types in writing assessment: Linking rater cognition to rater behavior. Language Assessment Quarterly, 9, 270-292.
Abstract: This research investigated the relation between rater cognition and rater behavior, focusing on differential severity/leniency regarding criteria. Participants comprised a sample of 18 raters examined in a previous rater cognition study (Eckes, 2008). These raters, who were known to differ widely in their perceptions of criterion importance, provided ratings of live examinee writing performance. Based on these ratings, criterion-related bias measures were estimated using many-facet Rasch measurement. A cluster analysis of bias measures yielded four operational rater types. Each type was characterized by a distinct pattern of differentially severe or lenient ratings on particular criteria. The observed bias patterns were related to differential perceptions of criterion importance: Criteria perceived as highly important were more closely associated with severe ratings, and criteria perceived as less important were more closely associated with lenient ratings. Implications of the demonstrated link between rater cognition and rater behavior for future research into the nature of rater bias are discussed. Adapted from the source document
Keywords: applied linguistics, writing instruction, acquisition, processes, and testing, Cognitive Processes, Writing Tests, Rating Scales, Test Validity and Reliability
- Elder, C., Barber, M., Staples, M., Osborne, R. H., Clerehan, R. & Buchbinder, R. (2012). Assessing health literacy: A new domain for collaboration between language testers and health professionals. Language Assessment Quarterly, 9, 205-224.
Abstract: Health literacy, defined as an individual's capacity to process health information in order to make appropriate health decisions, is the focus of increasing attention in medical fields due to growing awareness that suboptimal health literacy is associated with poorer health outcomes. To explore this issue, a number of instruments, reported to have high internal consistency and strong correlations with general literacy tests, have been developed. However, their validity as measures of the target construct is seldom explored using multiple sources of evidence. The current study, involving collaboration between health professionals and language specialists, set out to assess the validity of the Rapid Estimate of Adult Literacy in Medicine (REALM), which describes itself as a 'reading recognition' test that measures ability to pronounce common medical and lay terms. Drawing on a sample of 310 respondents, including both native and non-native speakers of English, investigations were undertaken to probe the REALM's validity as a measure of understanding the selected terms and to consider associations between scores on this widely used test and those derived from other recognized health literacy tests. Results suggest that the REALM underrepresents the health literacy construct and that the test may also be biased against non-native speakers of English. The study points to an expanded role for language testers, working in collaboration with experts from medical disciplines, in developing and evaluating health literacy tools. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, applied linguistics, adult language development/literacy studies, Adult Literacy, Health Care Practitioners, Language Tests, Test Validity and Reliability, Reading Tests
- Gan, Zhengdong. (2012). Complexity measures, task type, and analytic evaluations of speaking proficiency in a school-based assessment context. Language Assessment Quarterly, 9, 133-151.
Abstract: This study, which is part of a large-scale study of using objective measures to validate assessment rating scales and assessment tasks in a high-profile school-based assessment initiative in Hong Kong, examined how grammatical complexity measures relate to task type and analytic evaluations of students' speaking proficiency in a classroom-based assessment context. An in-depth analysis of oral performance on two different assessment tasks (i.e., monologic vs. interactive) from 30 English as a Second Language, Cantonese-mother-tongue, secondary school students was conducted using a range of measures of grammatical complexity derived from previous second language (L2) speaking and writing studies. Results showed that the individual presentation task tended to promote not only a greater number of T-units, clauses, verb phrases, and words but also longer T-units and utterances, thus probably stretching learners more in terms of complexity of grammatical and lexical processing. Results also showed that complexity measures recommended as among the most useful demonstrated no significant correlations with analytic ratings of learner speaking proficiency. These findings were then discussed in light of the complex, dynamic, and developmental nature of grammatical complexity as well as in light of a learner-, task-, and L2 form-sensitive account of L2 oral production. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Hong Kong, Cantonese, English as a Second Language Tests, Test Validity and Reliability, Oral Language, Complexity, Secondary School Students
- Gui, Min. (2012). Exploring differences between Chinese and American EFL teachers' evaluations of speech performance. Language Assessment Quarterly, 9, 186-203.
Abstract: This study explored whether American and Chinese English as a Foreign Language (EFL) teachers differ in their evaluations of student oral performance by examining the assessments of two groups of raters in an undergraduate speech competition. Each of the 21 contestants presented a 3-min prepared speech on a required topic, responded to a follow-up question, and gave a 1-min impromptu speech on a new topic. Three Chinese and three American EFL teachers rated the speech performances and recorded their comments for the individual contestants as well as for the contestants as a group. Immediately following the competition, the researcher interviewed the raters. The results revealed that American and Chinese EFL raters showed a high degree of agreement on the competition winners and the scores for the contestants. Qualitatively, however, the raters differed in their comments about the students' pronunciation, usage of English expressions, and speech delivery. The Chinese raters unanimously offered positive comments in these three areas, whereas the American raters gave varied and extensive critical comments. These results suggest a need for increased communication between Chinese and American EFL teachers, especially regarding their perceptions of what constitutes good English speech and their pedagogical priorities for oral English instruction. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Second Language Teachers, Teacher Attitudes, Speech Tests, English as a Second Language Tests, Test Validity and Reliability
- Kang, Okim. (2012). Impact of rater characteristics and prosodic features of speaker accentedness on ratings of international teaching assistants' oral performance. Language Assessment Quarterly, 9, 249-269.
Abstract: Few prior studies have examined degree of fidelity between raters' assessments of oral performances and objectively observable prosodic indices of accentedness. Prosodic indices of accentedness quantify trait-relevant variance, whereas rater background variables represent trait-irrelevant variance. The present study, therefore, investigated the extent to which raters' background characteristics and instrumentally measured prosodic indices of speakers' accentedness jointly influenced the rating of oral performances. Seventy U.S. undergraduate students rated the speaking and teaching proficiency of 11 international teaching assistants (ITAs). Using the PRAAT computer program, 5 min of continuous speech from each of the ITAs were instrumentally analyzed for a number of indices of speech rate, pausing, stress, and intonation. Dependent variables were undergraduates' ratings of ITA oral proficiency and instructional competence. Rater background variables such as the listener's native speaker status and experience as a language tutor explained 7-9% of the variance in oral performance ratings, whereas 18-19% was attributable to the prosody variables. These findings suggest that U.S. undergraduates are sensitive to trait-relevant indicators of ITA oral proficiency. At the same time, their speech evaluations are subject to substantial bias based on their own backgrounds. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Prosodic Features, Foreign Accent, Teachers, Stress, Second Language Tests, Test Validity and Reliability, Speech Rate, Pauses, Intonation
- Li, Hongli, & Suen, Hoi K. (2012). Are test accommodations for English language learners fair? Language Assessment Quarterly, 9, 293-309.
Abstract: Test accommodations have been proposed to help overcome the unfair challenges faced by English Language Learners (ELLs) due to their relatively low English proficiency. A test accommodation is regarded as effective when it improves the test performance of ELLs. However, this improvement raises the question of whether such accommodations give ELLs an unfair advantage. One criterion used in determining a test accommodation's fairness is that it should only remove the disadvantage that ELLs face in regard to their low language proficiency, without giving ELLs any additional advantages. This criterion is met when the test accommodation does not improve the test performance of the non-ELLs when the same accommodation is applied to them. To determine the fairness and, thus, the validity of test accommodations for ELLs, a meta-analysis using hierarchical linear modeling was conducted to compare the effects of test accommodations on the test performance of ELLs and on that of non-ELLs. The results indicated that test accommodations improved ELLs' test performance by about 0.156 standard deviation units but did not discernibly influence the test performance of non-ELLs. This meta-analysis, therefore, constitutes evidence to support the fairness and validity of providing test accommodations for ELLs. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, English as a Second Language Tests, Limited English Proficiency, Test Validity and Reliability
- Pae, Hye K., Greenberg, D., & Morris, Robin D. (2012). Construct validity and measurement invariance of the Peabody Picture Vocabulary Test-III Form A. Language Assessment Quarterly, 9, 152-171.
Abstract: The aim of this study was to apply the Rasch model to an analysis of the psychometric properties of the Peabody Picture Vocabulary Test-III Form A (PPVT-IIIA) items with struggling adult readers. The PPVT-IIIA was administered to 229 African American adults whose isolated word reading skills were between third and fifth grades. Conformity of the adults' performance on the PPVT-IIIA items was evaluated using the Winsteps software. Analysis of all PPVT-IIIA items combined did not fully support its use as a useful measure of receptive vocabulary for struggling adult readers who were African Americans. To achieve an adequate model fit, Items 73 through 156 were analyzed. The items analyzed showed adequate internal consistency reliability, unidimensionality, and freedom from differential item functioning for ability, gender, and age, with a minor modification. With an appropriate treatment of misfit items, the results supported the measurement properties, internal consistency reliability, unidimensionality of the PPVT-IIIA items, and measurement invariance of the test across subgroups of ability, age, and gender. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, applied linguistics, adult language development/literacy studies, Adults, Receptive Language, Adult Literacy, Test Validity and Reliability, Reading Deficiencies, Black Americans, Peabody Picture Vocabulary Test
- Walters, Jodee. (2012). Aspects of validity of a test of productive vocabulary: Lex30. Language Assessment Quarterly, 9, 172-185.
Abstract: This study investigates aspects of validity of an alternative measure of productive vocabulary. Lex30, developed by Meara and Fitzpatrick, is a word association task that claims to give an indication of productive vocabulary knowledge. Previous studies of Lex30 have assessed test-retest reliability, performance against native speaker norms, concurrent validity, reliability of parallel forms, and ability to reflect improvements in vocabulary development. In addition, the issue of construct validity has been explored. The study described here replicates some of these investigations with a different population and extends the investigation of construct validity. By comparing the performance of second language (L2) learners at different proficiency levels, the ability of the test to distinguish between levels of proficiency is explored. Concurrent validity is explored by comparing L2 learners' performance on Lex30 with that of two other productive vocabulary tests. Finally, one aspect of construct validity is explored by assessing whether Lex30 measures productive vocabulary use or simply recall. The findings indicate that Lex30 is a reliable and valid measure of productive vocabulary knowledge, but whether it measures only recall, or whether it measures actual ability to use vocabulary meaningfully and appropriately, appears to depend on the proficiency level of the test taker. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Test Validity and Reliability, Second Language Tests, Vocabulary
- Wu, Jessica R. W. (2012). GEPT and English language teaching and testing in Taiwan. Language Assessment Quarterly, 9, 11-25.
Abstract: The General English Proficiency Test (GEPT) is a 5-level, criterion-referenced English as a Foreign Language (EFL) testing system implemented in Taiwan to assess the general English proficiency of EFL learners. In 1999, with the aim of encouraging the general study of English and promoting beneficial washback effects on the teaching and learning of English, the Ministry of Education lent its support to the Language Training and Testing Center in the development of the GEPT. Over a decade of effort, the GEPT has won popular recognition in Taiwan. To date, more than 4.3 million Taiwanese have taken the test. This article first documents the evolution of the GEPT from the perspectives of test development and validation. The article then provides an overview of how GEPT scores are used in both educational and professional domains and discusses several key issues and problems that have emerged due to the new context introduced by the GEPT. Finally, the article outlines how the GEPT will address the challenges it faces in the years to come. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, English as a Second Language Tests, Taiwan, Test Validity and Reliability
- Yin, Muchun. (2012). Scratching where they itch: Evaluation of feedback on a diagnostic English grammar test for Taiwanese university students. Language Assessment Quarterly, 9, 78-104.
Abstract: Feedback to the test taker is a defining characteristic of diagnostic language testing (Alderson, 2005). This article reports on a study that investigated how much and in what ways students at a Taiwan university perceived the feedback to be useful on an online multiple-choice diagnostic English grammar test, both in general and by students of higher and lower language proficiency. Stage 1 involved questionnaire data from 68 students who rated each item's feedback according to usefulness, and Stage 2 involved interviews with five students as they read the feedback after taking the test. The data from these two stages showed students' overall positive attitude toward the feedback and students' preferences for particular feedback characteristics. The study also found that although higher proficiency test takers found the feedback to be more useful than lower proficiency test takers, views about the characteristics of good feedback were similar regardless of level. Recommendations for improving diagnostic language test construction and validation are discussed based upon the findings. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Feedback, English as a Second Language Tests, Student Attitudes, Taiwan, College Students, Test Validity and Reliability
- Filipi, A. (2012). Do questions written in the target language make foreign language listening comprehension tests more difficult?. Language Testing, 29(4), 511-532.
Abstract: The Assessment of Language Competence (ALC) certificates program is an annual, international testing program developed by the Australian Council for Educational Research to test the listening and reading comprehension skills of lower to middle year levels of secondary school. The tests are developed for three levels in French, German, Italian and Japanese, and at two levels in Chinese and Indonesian. There is a mixture of target language and English questions in the Level 2 and 3 tests. Some teachers have raised this as a concern in the belief that all questions should only be offered in English for the sake of fairness. Their view is that the tests are unduly difficult when they are designed with questions in the target language. Arising from this concern, the aim of the research reported in this paper was to investigate the effects of the language of the question on student performance. We drew on data from a trial test and a final listening test, a questionnaire administered to students to gauge their perceptions of the tests and an examination of public documentation about the ALC. For the statistical analysis, we used Item Response Theory for calibrating items and for comparing item difficulty estimates, and fit statistics to verify how well items with English and target language question formats worked together. We found that where the questions involved listening for simple, explicitly stated information, students found the items in the target language relatively easier. In questions that required students to listen for global meaning, language choice either did not matter or tended to favour items in English rather than the target language. Furthermore, each of the six tests with some items in English and others in the target language showed a high level of reliability and fit to the single latent scale, indicating that items were functioning consistently regardless of the language of the test question. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Second Language Tests, Australia, Listening Comprehension, Reading Comprehension, Secondary Education, Test Validity and Reliability
- McNamara, T., & Knoch, U. (2012). The Rasch wars: The emergence of Rasch measurement in language testing. Language Testing, 29(4), 555-576.
Abstract: This paper examines the uptake of Rasch measurement in Language Testing through a consideration of research published in Language Testing research journals in the period 1984 to 2009. Following the publication of the first papers on this topic, exploring the potential of the simple Rasch model for the analysis of dichotomous language test data, a debate ensued as to the assumptions of the theory, and the place of the model both within Item Response Theory (IRT) more generally and as appropriate for the analysis of language test data in particular. It seemed for some time that the reservations expressed about the use of the Rasch model might prevail. Gradually, however, the relevance of the analyses made possible by multi-faceted Rasch measurement to address validity issues within performance-based communicative language assessments overcame Language Testing researchers' initial resistance. The paper outlines three periods in the uptake of Rasch measurement in the field, and discusses the research which characterized each period. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Language Tests, Test Validity and Reliability
- Frost, K., Elder, C., & Wigglesworth, G. (2012). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers' oral performances. Language Testing, 29(3), 345-369.
Abstract: Performance on integrated tasks requires candidates to engage skills and strategies beyond language proficiency alone, in ways that can be difficult to define and measure for testing purposes. While it has been widely recognized that stimulus materials impact test performance, our understanding of the way in which test takers make use of these materials in their responses, particularly in the context of listening-speaking tasks, remains predominantly intuitive. Recent studies have highlighted the problems associated with content-related aspects of task fulfilment on integrated tasks, but little attempt has been made to operationalize the way in which content from the input material is integrated into speaking performances. Using discourse data from a trial administration of a pilot for an Oxford English language test, this paper investigates how test takers integrate stimulus materials into their speaking performances on an integrated listening-then-speaking summary task, whether these behaviours are reflected in the relevant rating scale and, by implication, whether the test scores assigned according to this scale reflect real differences in the quality of oral performances. An innovative discourse analytic approach was developed to analyse content-related aspects of performance in order to determine if such aspects represent an appropriate measure of the speaking ability construct. Results showed that the measures devised, such as the number of key points included from the input text, and the accuracy with which information was reproduced or reformulated, effectively distinguished participants according to their level of speaking proficiency. The study's findings support the use of this particular task-type and the appropriateness of the associated rating scale as a measure of speaking proficiency, as well as the utility of the devised discourse-based measures for the validation of integrated tasks in other assessment contexts. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Language Tests, Discourse Analysis, English Proficiency, Test Validity and Reliability, Speech Production, Oral Language
- Goodwin, Amanda P., Huggins, A. C., Carlo, M., Malabonga, V., Kenyon, D., Louguit, M., & August, D. (2012). Development and validation of extract the base: An English derivational morphology test for third through fifth grade monolingual students and Spanish-speaking English language learners. Language Testing, 29(2), 265-289.
Abstract: This study describes the development and validation of the Extract the Base test (ETB), which assesses derivational morphological awareness. Scores on this test were validated for 580 monolingual students and 373 Spanish-speaking English language learners (ELLs) in third through fifth grade. As part of the validation of the internal structure, which involved using the Generalized Partial Credit Model for tests with polytomous items, items on this test were shown to provide information about students of different abilities and also discriminate amongst such heterogeneous students. As part of the validation of the test's relationship to criterion, items were shown to correlate with measures of word identification, reading comprehension, and vocabulary measures. Differences in performances for fluent English students and ELLs, students of varied home language environments, and different grade levels were noted. Additionally, the task was validated using a dichotomous scoring system to provide reliability and validity information using this alternate scoring method. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Elementary School Students, Test Validity and Reliability, English as a Second Language Tests, Morphology
- Haug, T. (2012). Methodological and theoretical issues in the adaptation of sign language tests: An example from the adaptation of a test to German Sign Language. Language Testing, 29(2), 181-201.
Abstract: Despite the current need for reliable and valid test instruments in different countries in order to monitor the sign language acquisition of deaf children, very few tests are commercially available that offer strong evidence for their psychometric properties. This mirrors the current state of affairs for many sign languages, where very little research is available. No previous empirical study has focused explicitly on the linguistic, methodological, and theoretical issues involved in the process of adapting a test from a source sign language to a target sign language. Problems during the adaptation process can arise from linguistic differences between the source and the target language and differences in the source and the target cultures. Both are important aspects that need to be considered in the adaptation of a sign language test from a source to a target language. This study proposes a model for sign language test adaptation, based on the adaptation of the British Sign Language Receptive Skills Test to German Sign Language. The model includes different methodological steps, with a particular focus on construct validation. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Sign Language, Language Tests, German, Test Validity and Reliability
- Bridgeman, B., Powers, D., & Stone, E. (2012). TOEFL iBT speaking test scores as indicators of oral communicative language proficiency. Language Testing, 29(1), 91-108.
Abstract: Scores assigned by trained raters and by an automated scoring system (SpeechRaterTM) on the speaking section of the TOEFL iBT(TM) were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language Internet-based test (TOEFL iBT). Oral communicative effectiveness was evaluated both by rating scales and by the ability of the undergraduate raters to answer multiple-choice questions that could be answered only if the spoken response was understood. Correlations of these communicative competence indicators from the undergraduate raters with speech scores were substantially higher for the scores provided by the professional TOEFL iBT raters than for the scores provided by SpeechRater. Results suggested that both expert raters and SpeechRater are evaluating aspects of communicative competence, but that SpeechRater fails to measure aspects of the construct that human raters can evaluate. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, English as a Second Language Tests, Test Validity and Reliability, Communicative Competence, Speech Tests
- Chapelle, C. A. (2012). Validity argument for language assessment: The framework is simple.... Language Testing, 29(1), 19-27.
Abstract: In this commentary, Chapelle responds to Michael Kane's (same journal issue) article "Validating score interpretations and uses: Messick Lecture, Language Testing Research Colloquium, Cambridge, April 2010." Chapelle elaborates on some issues that Kane's approach raises for language testing based on her experiences with interpretive arguments and validity arguments. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Language Tests, Test Validity and Reliability
- Davies, A. (2012). Kane, validity and soundness. Language Testing, 29(1), 37-42.
Abstract: In this commentary, Davies responds to Michael Kane's (same journal issue) article "Validating score interpretations and uses: Messick Lecture, Language Testing Research Colloquium, Cambridge, April 2010." Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Test Validity and Reliability, Language Tests